Abstract. Given a series of photographs taken during a Go game, we describe the techniques we successfully employ for pinpointing the grid lines of the Go board and for tracking their small movements between consecutive photographs; then we discuss how to approximate the location and orientation of the observer’s point of view, in order to compensate for projection effects. Finally we describe the different criteria that jointly form the algorithm for stone detection, thus enabling us to automatically reconstruct the whole move sequence.


1 Introduction 

The identification of a Go position by means of a photograph (or a still frame) has been widely dealt with in the last 10 years. In theory, the complete score of a Go game could be reconstructed if enough photos are available: a remarkable feat that could prove very useful given the persistent lack of touch-sensitive “gobans”.¹ While it is true that under ideal conditions the identification of a position is not a difficult problem (nor is it an easy one), things change a lot if we try to analyse a whole game, often under bad conditions, such as a low point of view, faint ambient light, the presence of shadows/reflections and the continual presence of the players’ hands between the camera and the goban. Furthermore, the game reconstruction should be completed in a matter of minutes or even in real time, otherwise the traditional way (manual compilation of a “kifu”²) would still be preferable to an automated task of any sort.

That’s likely the reason why most attempts to this day did not go beyond the test phase or disappeared completely from the Internet after some time.³ The first known program capable of detecting the stones in a single image was likely CompoGo [2], although it needed optimal conditions; GoCam [6] was likely the first attempt at analysing a video and was capable of detecting the grid, but was never completed; the same happened to Saikifu [5], which looked promising but was abandoned some years ago. In 2007 Alexander Seewald [10] wrote the most interesting theoretical analysis of the problem, claiming a high success rate on single images, but did not develop a program capable of analysing a whole game. In the following years many other studies looked promising, but no software was ever developed despite some good theoretical works: among them only Webcam + Go [9] went a bit further, but the author himself admitted it could not analyse a whole game. Eventually something interesting appeared: first Kifu Snap [3], a non-free program (for Android) capable of correctly analysing most single pictures; then Imago [7], a program that at last was capable of analysing a whole game without making too many errors (its success rate is about 76%). Unfortunately it takes about 30 seconds per photo, according to the author himself.

¹ Goban is the Japanese term for Go board.

² A kifu is a pre-printed grid where the game is recorded.

³ For instance: AutoGoRecorder, Go Game Recorder, Go Watcher, GoTracer, Image2SGF, kifu, Rocamgo.

Despite a situation that does not look very promising, with only one program, Imago, really capable (albeit slowly) of analysing a whole game, we’ll show that it is indeed possible to achieve good results at the speed of a fraction of a second per move, even under less-than-ideal conditions. In section 2 we describe how to locate the grid lines of a goban in the first photo; thereafter we expound how to swiftly follow, in the sequence of the photos, the small displacements of the grid caused by movements of the goban and/or movements of the camera; then, for each photo, we explain how to infer the position and orientation of the camera. In section 3 we give full details of how to make the most of previously collected information and several other criteria to detect the stones put on the goban. In section 4 we draw conclusions and outline plausible future developments.



2 Tracking the grid 

2.1 Starting location of the grid 

1 A “difference of Gaussians” filter is applied to the first photo of the series, which is then converted into a B/W image (one bit per pixel) in order to highlight all the visible edges, as shown in figure 1.



Figure 1: difference of Gaussians and B/W conversion of a photo of a goban. The dot in the centre is the pole of the polar coordinate system needed in paragraphs 2-6; the unit of measure for radial coordinates is marked in the polar axis and it coincides with half of the height of the photo (thus the diagonal is √13 units, for full frame digital cameras).

2 The Hough transform [4] of the B/W image is computed (see figure 2). Since the Hough transform is a very stable and powerful tool for identifying lines in a picture, it turned out that a complete and onerous calculation of the transform is not always required, especially for large photos. In order to boost performance, a complete calculation is carried out only for photos up to about half a Mpixel; the larger the photo, the smaller the percentage of pixels we actually use to compute the Hough transform, reaching a minimum of 25% for photos of 2.5 Mpixels (larger photos would be reduced). Pixels are homogeneously selected through a pseudo-random Richtmyer pattern [8].
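As a rough illustration of the pixel selection in step 2, the sketch below generates a two-dimensional Richtmyer sequence, whose n-th point is the fractional part of n times the square roots of two distinct primes. The function name `richtmyer_points` and the choice of the primes 2 and 3 are assumptions made for this example; the text does not specify which generators are used.

```python
import math

def richtmyer_points(count, width, height):
    """Return `count` pixel coordinates spread quasi-uniformly over the image.

    Assumption: the generators are sqrt(2) and sqrt(3); any pair of square
    roots of distinct primes would give a sequence of the same kind.
    """
    a, b = math.sqrt(2.0), math.sqrt(3.0)
    points = []
    for n in range(1, count + 1):
        u = (n * a) % 1.0  # fractional part, in [0, 1)
        v = (n * b) % 1.0
        x = min(int(u * width), width - 1)    # clamp against float rounding
        y = min(int(v * height), height - 1)
        points.append((x, y))
    return points

# Example: pick 25% of the pixels of a 640x480 image.
sample = richtmyer_points(640 * 480 // 4, 640, 480)
```

Only the selected pixels would then cast votes in the Hough accumulator; since the sequence covers the frame evenly, long lines keep roughly the same relative strength as in a full computation.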

3 The “stronger” a line is in the original image, the higher its corresponding peak in the Hough space. Thus, the local maxima of the transformed image are singled out and sorted: assuming the goban is the main subject of the photo, its grid lines should be among the highest local maxima. So, in a histogram they should form a quite clear “plateau” as in figure 3, well above the background noise, which will be discarded in subsequent computations.

The sorting algorithm used is an adapted version of bubble sort: the values to be sorted may be several thousand, yet the computational complexity is linear as we are only interested in a quite small and preset number of the highest local maxima.
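One plausible reading of the “adapted bubble sort” is a bubble sort stopped after k passes, which costs O(k·n) and is therefore linear in n for a preset k. The sketch below follows that reading; the function name `top_k_bubble` is hypothetical and the authors’ actual adaptation may differ.

```python
def top_k_bubble(values, k):
    """Return the k largest values in decreasing order using k bubble passes."""
    data = list(values)
    n = len(data)
    k = min(k, n)
    for i in range(k):
        # One backward pass carries the largest value of data[i:] to index i.
        for j in range(n - 1, i, -1):
            if data[j] > data[j - 1]:
                data[j], data[j - 1] = data[j - 1], data[j]
    return data[:k]

peaks = top_k_bubble([3, 9, 1, 7, 5, 8], 3)
```

With k fixed (for instance the 128 maxima of figure 3), the running time grows only with the number of local maxima found in the Hough space.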



Figure 2: Hough transform of the image in figure 1. Each point in the xy-plane represents a line in the original picture: the x-axis is the distance of the line from the pole, the y-axis is the angle of incline of the line and the z-axis is its Hough value. The two sets of almost aligned peaks contain the lines of the grid (and the borders of the goban): in the set near the lower-right side there are the transverse lines (as seen in figure 1), while longitudinal lines are included in the set near the upper-left side.

Figure 3: histogram of the 128 highest local maxima found in the Hough transform shown in figure 2, sorted by decreasing values. The maxima that will be discarded as noise are drawn in grey, the intermediate plateau contains the longitudinal lines of the grid in figure 1 and, finally, the highest bars on the left include the transverse (and longest) lines of figure 1. The boundary of the plateau has been given a wide tolerance to avoid any risk of throwing away useful data.

4 The grid of any goban is formed by two perpendicular sets of parallel lines. Taking a photo means projecting those lines onto the plane of the camera sensor: if the lines are parallel to the sensor their projection is a set of parallel lines too, otherwise they are projected into a set of converging lines. When in step 2 those sets of lines are transformed into points of the Hough space, they lie either on a horizontal straight line (former case) or on a sinusoid (latter case). Under at least fair conditions, which we define in section 4, the relevant stretch of the sinusoid is either very small, compared with its amplitude, or near its inflexion point, or both, and is thus a very good approximation of a straight line. That justifies the use of a second run of the Hough transform (given a suitable tolerance) to identify in the first Hough space the two sets of lines of the grid. As the local maxima selected in the previous step are at most a few dozen points, that second transform is fast and more accurate than RANSAC or similar algorithms.

5 Once the two mutually orthogonal sets of parallel lines are identified, they are pruned of spurious lines (e.g. the wooden borders of the goban). This task is accomplished by evaluating the median distance between the lines of the set and excluding those lines whose placement fits worst, even when taking the effects of projection into account.

6 The same expected median distance is used to interpolate missing lines as well: if there is a gap in a set of parallel lines, caused by any contingent disturbance in the photo, and if there is at least a “low” local maximum in the Hough transform where one or more lines are missing, then those lines are inserted in the set.

7 Among the lines selected in the previous steps an attempt is made to select the subset of 19 + 19 lines most likely forming the grid, minimizing a norm that measures the uniformity of the distances between the lines. In case of failure another attempt is made with 13 + 13 lines only, then 9 + 9. In the highly infrequent event that the last attempt fails too, the whole procedure is aborted and the user has to set the size of the goban and manually pinpoint the four corners of the grid in the photo.

8 If the grid has been found, the lines forming it are mutually intersected to compute the placement of each grid point. Furthermore, the actual spacings between lines are recorded for future use, this being the rationale: the coordinates of the internal grid points could be geometrically computed by knowing the coordinates of the four corners alone, but it happens that the grids of most gobans are not perfectly drawn. Therefore, using the recorded spacings instead of the equidistant ones leads to a more precise correspondence between the actual grid points in the photo and the computed ones.
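The mutual intersection of step 8 can be sketched as follows, assuming each line is stored in the usual Hough normal form (rho, theta), i.e. x·cos(theta) + y·sin(theta) = rho with respect to the pole. Both the parametrisation and the helper names `intersect` and `grid_points` are assumptions made for this example.

```python
import math

def intersect(line_a, line_b):
    """Intersect two lines given as (rho, theta) in Hough normal form."""
    rho1, th1 = line_a
    rho2, th2 = line_b
    det = math.sin(th2 - th1)  # zero when the lines are parallel
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel")
    x = (rho1 * math.sin(th2) - rho2 * math.sin(th1)) / det
    y = (rho2 * math.cos(th1) - rho1 * math.cos(th2)) / det
    return x, y

def grid_points(longitudinal, transverse):
    """Return the grid points as a list of rows, one row per transverse line."""
    return [[intersect(a, b) for a in longitudinal] for b in transverse]

# The line x = 2 meets the line y = 3 at (2, 3).
corner = intersect((2.0, 0.0), (3.0, math.pi / 2))
```

Because every detected line keeps its own (rho, theta), the resulting points automatically reflect the real, slightly uneven spacings of the grid rather than ideal equidistant ones.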

2.2 Automatic micro-recalibration 

The grid of the goban must be accurately pinpointed for each and every photo of the game, yet applying every time the algorithm discussed in subsection 2.1 is not an option, for both practical and theoretical reasons. The former is the sheer computational complexity: even on high-end up-to-date personal computers it requires from one to a few seconds to be executed (most of the time is spent calculating the Hough transform, depending on size, digital noise and other features of the analysed photo). The latter reason is the fact that the more the game progresses, the more the stones conceal the grid lines: experiments show that after 150 moves or so that algorithm rarely succeeds. On the other hand we cannot suppose that the grid always remains in the same position inside the photos throughout the entire game: even if a stand is used, vibrations may occur, players may hit the camera, or the stand, or the goban and so on.

Provided the corners of the grid are not too close to the edges of the photos (a minimum distance of approximately the diameter of a stone is required), three different variations of the same idea successfully solve the issue almost every time. First of all we assume that, if a movement really occurs between two consecutive photos, the displacement of the image of the grid between those photos is small (up to the radius of a stone). This assumption allows faster calculations and it is almost automatically fulfilled in the case of a program dealing with video streams, as suggested in section 4. Then, starting from the first one, the procedures described below are applied, each one only after the failure of the previous one: if one step succeeds the search ends there; when they all fail the entire procedure described in subsection 2.1 must be run anew. In this way we generally attain the tracking of small movements of the grid within a time in the order of magnitude of a tenth of a second.

1 Corner Hough transform 

In a neighbourhood of the last known position of each corner of the grid we compute an adapted version of the Hough transform, modified to detect corners whose sides are nearly parallel to the external lines of the latest recognised grid (this is coherent with the small displacement hypothesis). Using techniques derived from projective geometry, we are able to discard false positives or to detect the new position of the grid even if one corner is not recognized or is hidden by a stone: to accomplish that we compute the complex cross-ratio of the four corners of the grid, using their x and y coordinates in the photo as the real and imaginary part respectively of a single complex number. That complex cross-ratio is preserved by rotations, translations and homogeneous dilations, but it is not preserved by real central projections, providing a useful tool for detecting substantial changes in the point of view or errors in the pinpointing of the corners.
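The invariance just described is easy to check numerically. The sketch below uses one common ordering of the cross-ratio, (a−c)(b−d) / ((a−d)(b−c)); the text does not say which ordering is used, so this choice, like the helper `missing_corner`, is an assumption. The helper recovers a fourth corner from the other three under the hypothesis that the cross-ratio measured in the previous photo still holds.

```python
def cross_ratio(a, b, c, d):
    """Complex cross-ratio of four points given as complex numbers x + iy."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def missing_corner(a, b, c, ratio):
    """Solve cross_ratio(a, b, c, d) == ratio for the fourth corner d."""
    return ((a - c) * b - ratio * (b - c) * a) / ((a - c) - ratio * (b - c))

square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
reference = cross_ratio(*square)

# A rotation, dilation and translation (z -> alpha*z + beta) leaves it unchanged.
alpha, beta = 2 + 1j, 3 - 4j
moved = [alpha * z + beta for z in square]
same = cross_ratio(*moved)

# Displacing a single corner, as a change of perspective would, alters it.
skewed = cross_ratio(0 + 0j, 1 + 0j, 1.2 + 0.8j, 0 + 1j)
```

Comparing the cross-ratio of a candidate set of corners with the one of the last recognised grid therefore flags a wrongly detected corner, while `missing_corner` gives an estimate for a corner hidden by a stone.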

This method should work almost to the end of the game, when two or more corners may be covered by stones and thus not be visible in the photos.

2 Linear Hough transform 

A fragment of the (standard linear) Hough transform is computed in a neighbourhood of the latest known position of each external line of the grid, actually only around those segments of the lines where we know there are no stones (except the last one played, which is not yet detected).

Like the previous method, this one should work up to the last moves of the game too, when a side of the grid may happen to be almost entirely concealed by stones. For that reason, in order to further improve time performance (with a negligible worsening of the success rate), it may eventually be discarded.

3 Elliptic Hough transform 

A fragment of the Hough transform for elliptic shapes is calculated in a neighbourhood of each known stone placed on the external lines of the grid; the positions of the lines are thereby deduced through a linear regression applied to the coordinates of the centres of the recognised stones, taking into account both projective distortions and systematic misplacements of the stones.

This method is not applicable in the first part of the game, a phase when usually there are almost no stones on the external lines, but it becomes more and more accurate towards the end of the game.

2.3 Approximation of the point of view 

Even if a stone is perfectly placed upon a grid point, which hardly ever happens (but can be assumed as the mean placement), the projection of its geometrical centre does not coincide with the grid point in the photo, unless the camera is on the “point at infinity” of the normal line to the plane of the goban, which obviously never happens. The discrepancy is substantial for the stones on the furthest lines of the grid, especially if the photos are taken from a low point of view; nevertheless it is non-negligible even in photos shot under good conditions, as shown in figure 4.



Figure 4: discrepancy between the computed grid points (the white crosses where there are no stones, the lower dot where there are) and the computed projection of the geometrical centre of the stones (the upper dot), the latter clearly being a better approximation of the actual location of the stones, even the misplaced ones.

Since the goban is flat, the knowledge of its position in the photo gives no information about the third dimension and the orientation of the z-axis in each grid point. In order to obtain that information it is necessary to approximate the position and orientation of the camera actually taking the photos and to use those data to calculate the projection onto the plane of the image sensor of any point of the space which we are interested in. The problem of finding the point of view is typically solvable when two or more pictures of the same scene are available, for instance using techniques of epipolar geometry, or when it is possible to implement an interactive search, as shown in [1]. Unfortunately, neither of those approaches is applicable in our case, as we need the point of view for each photo separately. Furthermore, according to Hadamard’s definition, the general problem of finding the point of view of a single image is ill-posed.

What we can exploit to solve it is the a priori knowledge that we are observing the projection of an approximately square-shaped object and that we are indeed only interested in shapes, not in absolute dimensions (i.e. we do not need to know how big the goban is or how distant it is from the camera). So we set a relative unit of measure (half of the longitudinal side of the grid, as seen by the camera) and an origin for the Cartesian coordinate system (the “tengen”⁴), with x-axis and y-axis parallel to the lines of the grid and z-axis pointing upwards. Like any rigid body, the camera has six degrees of freedom: the three coordinates of the nodal point N of the lens (which is the actual point of view), the orientation of the camera (which is defined by the two coordinates of the point G at which the camera is aimed in the plane of the goban) and finally the angle of roll of the camera about the line NG (which is zero if the camera is level, non-zero if the camera is leaning). Direct geometrical calculations allow us to obtain the vanishing points of the lines and diagonals of the grid, which in turn are used to compute the horizon and then the angle of roll of the camera. Moreover, the coordinates of G are evaluated by applying real cross-ratios and, finally, it is possible to infer the equation of the vertical plane V containing N from the intersections among the lines of the grid and the line perpendicular to the horizon passing through G.

That completes the list of quantities immediately and explicitly deducible from the analysis of the photo. Two more coordinates remain to be approximated: those of N in V, for which we use an algorithm derived from both the shooting method and the bisection one used in numerical analysis.

1. Apart from some handmade amateurish gobans, for aesthetic reasons the grid is not square: to compensate for perspective effects the ratio between its sides is about 1.077 (for Japanese and Korean gobans) or 1.038 (for Chinese gobans). Since the dilation along a single axis is not a projection and since the longest side of the grid may be either the longitudinal or the transverse one in the photo, repeat steps from 2 to 5 for $t \in \{1.077, 1.077^{-1}, 1.038, 1.038^{-1}\}$.

2. Let $\varphi_0 = 45^\circ$ be the angular coordinate of a point $N_0$ in the polar coordinate system of V with pole G and horizontal polar axis; let $\rho_0 = 5$ be its radial coordinate (with the same unit of measure as the main Cartesian system, thus 5 units are equivalent to 2.5 times the length of the longitudinal lines of the grid).

3. Starting with $k = 0$ and using $N_k$ as point of view, project a rectangle whose corners are the points $(\pm 1, \pm t, 0)$ onto the plane containing G and perpendicular to $N_k G$, thus obtaining a projection homothetic to the one lying in the plane of the image sensor, whose location is unknowable.

4. Slightly altering the values of $\rho_k$ and $\varphi_k$, evaluate the projective convergence of opposite longitudinal sides of the grid and the angle between the main diagonals. Hence, comparing those angles to the actual ones in the photo, for increasing values of $k \in \mathbb{N}$ accordingly set the new values of the couple

$$\rho_{k+1} = \rho_k + \Delta\rho_k, \qquad \varphi_{k+1} = \varphi_k + \Delta\varphi_k$$

and iterate this and the previous step until $N_k$ converges (with quickly decreasing values for $\Delta\rho_k$ and $\Delta\varphi_k$ convergence is guaranteed).

5. Uniformly scale the last computed grid to best fit the one in the photo and calculate the maximum distance between each corner of the grid and the actual corners found in the photo.

6. Choose the value of t that minimises the distances of step 5, thus defining the most likely kind and orientation of the goban and, as a by-product, the point of view (see figure 5).

⁴ Tengen is a Japanese Go term literally meaning “origin of heaven”: it denotes the central point of the grid.
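The projection used in step 3 of the list above can be sketched as a plain central projection: each corner is joined to the point of view N by a ray, and the ray is cut by the plane through G whose normal is the vector NG. The function names `project` and `project_rectangle` and the representation of points as 3-tuples are assumptions of this example; the iterative update of the radial and angular coordinates in step 4 is not reproduced here.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project(point, n, g):
    """Project `point` from the viewpoint n onto the plane through g normal to n - g."""
    normal = tuple(ni - gi for ni, gi in zip(n, g))
    ray = tuple(pi - ni for pi, ni in zip(point, n))
    denom = dot(ray, normal)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the projection plane")
    # Solve (n + s * ray - g) . normal = 0 for the ray parameter s.
    s = dot(tuple(gi - ni for gi, ni in zip(g, n)), normal) / denom
    return tuple(ni + s * ri for ni, ri in zip(n, ray))

def project_rectangle(n, g, t):
    """Project the four corners (±1, ±t, 0) of the model grid."""
    corners = [(1.0, t, 0.0), (-1.0, t, 0.0), (-1.0, -t, 0.0), (1.0, -t, 0.0)]
    return [project(c, n, g) for c in corners]

# Viewpoint and aim point taken from the caption of figure 5.
image = project_rectangle((3.789, 2.072, 3.667), (0.207, 0.072, 0.0), 1.038)
```

The convergence of the projected sides and the angle between the projected diagonals can then be measured on `image` and compared with the values observed in the photo.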



Figure 5: the leaning central black cross marks the point at which the camera is aimed; computed values for this photo are: N(3.789, 2.072, 3.667), G(0.207, 0.072), angle of roll = −3.34° and t = 1.038 (Chinese goban placed transversely). The three-dimensional orthonormal basis is drawn in white, as well as the vertical half-versors in each corner of the grid.


3 Detecting the stones 

3.1 From theory to practice 

As previously pointed out, detecting stones on a goban is not a difficult task given optimal conditions, but real games are another matter and a lot of problems invariably arise. We have already discussed what happens when the table is bumped or the camera vibrates, forcing an automatic recalibration of the grid, but most issues emerge because good conditions, such as in figure 6, are uncommon. In figure 7 we see instead which conditions are typical of real games, during which there is no control over the point of view (standing too close would disturb the players) nor over the light, both factors being crucial for the detection process. The players themselves, who usually pay little attention to the camera, may cause further problems, making it difficult to take pictures without some fingers, hands and even whole arms covering many, or most, stones. Also, in the heat of the game it’s not uncommon to skip some pictures or, on the contrary, to take more than one of a single move.

Figure 6: good conditions.

Figure 7: poor conditions (low point of view, player’s arm visible).

Fixing these latter mistakes is easy (for example, if a photograph is skipped the last stone put on the goban will certainly be missed, but, together with the next one, could be detected anyhow in the following picture), but the real challenge is detecting the stones under poor conditions, such as the ones depicted above. Unless error-proof algorithms were employed, such conditions could result in a frustrating series of errors that at best would transform the analysis into a slow, painful process, and at worst could even go unnoticed, turning the analysis into a disaster, with a likely wrong score.

To clarify the extent of such problems, let’s just mention that during the “Il David” Tournament, held in Florence in December 2012, we recorded a game by means of 269 photographs, each one taken after a single move. Of these photographs, 65 (1 out of 4, and up to 6 consecutive) were affected by “disturbance”, such as fingers/hands/arms of the players. Also, dark shadows were often cast on the goban, further affecting stone detection. Without error-proof algorithms, capable of detecting the stones no matter what, any automatic reconstruction of such a game would inevitably fail and, instead of losing a lot of time trying to get over the continual errors thrown by simple algorithms, it would be better to switch to a manual inspection of the pictures, in order to at least reconstruct the game by hand.

3.2 Detecting all stones vs. detecting last stone

Before choosing an algorithm, it’s important to decide which one of the following two approaches is the best: the “traditional” one, implemented by Kifu Snap and Imago, for example, that is starting from scratch on every picture in order to locate the grid and every single stone on the goban, or the “minimal” one, that is relying heavily upon data collected from previous pictures and trying to detect only the last stone put on the goban. We found the second approach to be the best one, at least when the goal is the reconstruction of a whole game and not just the identification of a specific position. There are at least four reasons to support such a choice:

1 Just by looking at figure 7 it is clear there is no way to guess which stones may be hidden by the “disturbance” (a whole arm), no matter how good the algorithm employed is. The “traditional” approach would always fail, while the “minimal” one, were the last stone put on the goban still visible, would likely succeed, at least in theory.

2 The second approach, having just one stone to detect, is much faster than the first one. This is not immediately clear, as every intersection has to be checked, no matter how many stones have to be found. But as the second approach relies on data collected from previous pictures, it knows where the empty intersections are and may limit its search to them, thus reducing the time required by the calculations; also, knowing for certain that all the intersections but one are empty, it can examine thoroughly only the most promising ones, discarding the others after a rough examination. That’s why Imago, the program mentioned before, takes about 28 seconds per move (although most of the time is needed for locating the grid), as it makes use of the first, “traditional” approach. The second, “minimal” approach could require only a fraction of a second per move.

3 Although not so obvious at first, looking for all the stones is more error-prone than looking for just one. For example, if the first approach employs an algorithm with a known success rate of 99% (meaning that, for every intersection, it will correctly guess 99% of the time whether it’s empty or a stone is there) it would seem, at first, that the outcome will be very good. But on closer scrutiny it becomes clear it is by no means good: as there are 361 intersections on a standard goban, the algorithm will fail, on average, on 3.61 of them and for each picture the chance of turning out error-free will be just 0.99^361 ≈ 2.7%. This means that almost all pictures in a game will be affected by errors, most of the time multiple ones, and no automatic reconstruction will ever be possible. Even a 99.9% success rate would not suffice, as just about 70% of pictures will be error-free, a percentage not high enough.⁵ That’s why Imago, despite claiming a remarkable success rate of 99.86%, fails on 24% of its test pictures (13 out of 55).⁶

If we make use of the “minimal” approach instead, we could employ two algorithms, as hinted before: one to select the most promising intersections, the other to analyse these in depth. Let’s now assume the second algorithm could claim, once again, a success rate of 99% (being possibly the same algorithm employed before); let’s also assume ten or so intersections must be selected to allow the first algorithm an equal success rate of 99%, meaning now that 99% of the time the stone we’re looking for will indeed be found on one of these intersections. If the assumptions were correct, the chance of each picture turning out error-free will be 0.99^11, almost 90%: a very good result, although the same success rates proved inadequate when making use of the first approach. Furthermore, according to our tests only two or three intersections usually have to be selected to let the first algorithm achieve a success rate of 99%: selecting ten will further increase the overall efficiency of the “minimal” approach.
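The figures quoted in this comparison follow from treating every check as an independent trial with the same success rate, so that a picture is error-free with probability rate^checks. The short sketch below reproduces them; the function name `error_free_probability` is only illustrative, and independence of the checks is the simplifying assumption behind the numbers.

```python
def error_free_probability(success_rate, checks):
    """Chance that `checks` independent decisions are all correct."""
    return success_rate ** checks

# "Traditional" approach: all 361 intersections are classified in every picture.
traditional_99 = error_free_probability(0.99, 361)      # about 2.7%
traditional_999 = error_free_probability(0.999, 361)    # about 70%
traditional_9999 = error_free_probability(0.9999, 361)  # about 96.5%
imago_expected = error_free_probability(0.9986, 361)    # about 60%, i.e. 40% failures

# "Minimal" approach: one selection step plus ten in-depth checks.
minimal = error_free_probability(0.99, 1 + 10)           # almost 90%
```

The exponent, not the per-intersection success rate, dominates the result: cutting the number of decisions from 361 to 11 raises the share of error-free pictures from under 3% to almost 90% at the same 99% rate.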

4 The “minimal” approach makes things easier, because locating the stones and avoiding false negatives/positives is much simpler. For example, a simple way of locating the stones is to compute the luminance of the intersections, relying on the likely equivalence: high luminance ⇔ a white stone; low luminance ⇔ a black stone; in the middle, it’s empty. A simple procedure could be:

• computing, for every intersection, the value of its luminance, averaging that of many pixels around the centre;

• establishing a “high threshold” and a “low threshold” by means of test pictures;

• if luminance of the intersection > high threshold, a white stone is there; if luminance < low threshold, a black stone is there; if luminance is in between, it’s empty.
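The three-step procedure above can be sketched as follows, assuming the picture is already available as a grid of 8-bit luminance values (a list of rows). The function names, the square averaging window and the default thresholds 100 and 215 (the values the text reports for figure 6) are choices made for this illustration only.

```python
def mean_luminance(image, cx, cy, radius):
    """Average the luminance over a square window centred on (cx, cy)."""
    values = [image[y][x]
              for y in range(cy - radius, cy + radius + 1)
              for x in range(cx - radius, cx + radius + 1)]
    return sum(values) / len(values)

def classify(luminance, low=100, high=215):
    """Apply the two fixed thresholds to one intersection."""
    if luminance > high:
        return "white"
    if luminance < low:
        return "black"
    return "empty"

# Toy 9x9 picture: bare wood everywhere, a bright stone in the middle.
picture = [[160] * 9 for _ in range(9)]
for y in range(3, 6):
    for x in range(3, 6):
        picture[y][x] = 230
centre = classify(mean_luminance(picture, 4, 4, 1))
corner = classify(mean_luminance(picture, 1, 1, 1))
```

The weakness of the method is visible in the signature of `classify`: the thresholds are constants, so any change of lighting between pictures shifts the luminance values while the decision boundaries stay where they were.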

⁵ It requires an unbelievable 99.99% success rate in order to lower the picture failure rate to a reasonable 3.5%.

⁶ That’s better than the expected percentage of 40% because the test pictures were not chosen randomly, resulting in a more compact error distribution.



This procedure usually works well enough. But nothing guarantees these same thresholds will work on the next pictures, as a small change in the light could easily alter the luminance distribution. And in other games the situation could get even worse. For example, the picture in figure 6 has a “high threshold” at 215 (out of 255, as the values are expressed by means of unsigned bytes) and a “low threshold” at 100, but if the same values are applied to the picture in figure 8 we’ll miss, albeit by a small margin, the stone pointed out by the white arrow (a false negative) and mistake the empty intersection marked by the circle for a black stone (a false positive).

Figure 8

Improving the algorithm does not work: let’s try, for example, to dynamically set the thresholds, computing first the mean luminance of the whole goban, then adding/subtracting a fixed percentage. In such a case the picture in figure 6 needs a “high threshold” of 33% (that is, the luminance of white stones is always greater than the mean luminance of the whole goban plus 33%), which is not enough to detect, again, the stone pointed out by the white arrow in figure 8. The two luminance distributions are simply too far apart.

To solve the problem once and for all, two things are needed: first, finding more criteria to combine together; second, restricting the analysis to the empty intersections only, in order to get rid of the misleading contribution of the stones. And that’s why the “minimal” approach is needed.

In order to better understand this point, let’s have a look at the luminance distribution of the picture in figure 8, first considering all the intersections, then removing all the stones except for the last one put on the goban: it’s quite obvious that the single peak on the right, which matches the last stone put on the goban (pointed out by the black arrow in figure 8), is much easier to find than all the peaks on the left. It would suffice to pick the smallest value of the distribution function to find the stone, something not feasible on the other one, where the same peak is not the most prominent. And although most peaks may easily be identified, some may not (the drawback of having so many to deal with) and require further analysis to rule out any false positive and, of course, not to miss any stone.

Figure 9: luminance distribution of figure 8 intersections.

3.3 Choosing an algorithm

After having chosen the “minimal” approach (searching for the last stone only) over the “traditional” one (searching for all the stones), we’ll now describe which criteria are best suited for finding the stones and why, then how to combine them, and eventually which use to make of the “combination”.

3.3.1 The criteria


1 Difference between pictures 

This criterion is simple: without “disturbance”, two 
consecutive pictures of the same game should be 
identical except for the last stone put on the goban. ^ 
Computing the difference between a picture and the 
previous one should immediately highlight the last 
stone only. In figure 10 we see an example of such 
an operation, with a really good outcome. 

The pros: it’s a fast operation and it’s very easy to implement. In the absence of disturbance it works very well, with reliable outcomes.

The cons: it’s extremely sensitive to disturbance, however small; furthermore, as any disturbance will also affect the difference with the next picture, it will be necessary to choose the lesser evil: either dealing with another dubious outcome or discarding the picture entirely, thus losing valuable information.
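The criterion can be sketched as follows. This is a minimal illustration, not PhotoKifu's code: it assumes the two pictures are already aligned greyscale images of equal size stored as nested lists, and that the pixel coordinates of the intersections are known. The names `diff_score` and `strongest_change` and the window half-size `r` are hypothetical.

```python
def diff_score(prev, curr, cx, cy, r=1):
    """Mean absolute luminance change in a (2r+1)x(2r+1) window around (cx, cy)."""
    total, count = 0, 0
    for y in range(max(0, cy - r), min(len(curr), cy + r + 1)):
        for x in range(max(0, cx - r), min(len(curr[0]), cx + r + 1)):
            total += abs(curr[y][x] - prev[y][x])
            count += 1
    return total / count


def strongest_change(prev, curr, intersections, r=1):
    """Return the intersection whose neighbourhood changed the most, and its score."""
    scored = [(diff_score(prev, curr, x, y, r), (x, y)) for (x, y) in intersections]
    score, point = max(scored)
    return point, score


# Toy 6x6 picture: uniform wood (luminance 150); a dark stone appears around (4, 1).
prev = [[150] * 6 for _ in range(6)]
curr = [row[:] for row in prev]
for y in (0, 1, 2):
    for x in (3, 4, 5):
        curr[y][x] = 30

intersections = [(1, 1), (4, 1), (1, 4), (4, 4)]
point, score = strongest_change(prev, curr, intersections)
print(point, score)
```

In this toy case only one neighbourhood changes, so the maximum is unambiguous; a hand or a shadow entering the picture would raise the score of many intersections at once, which is the sensitivity described above.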



Figure 10: difference between consecutive pictures: only 
the last stone put on the goban stands out clearly. 

^ The difference would also extend to any captured stones but, in such a case, it can be neglected, as the “minimal” approach only evaluates empty intersections.

2 Analysis of local features

We’ve already discussed the luminance, which is one of these features, albeit not a very useful one, as it’s sensitive not only to disturbance, but also to changes in the light that could affect the thresholds employed. Another feature is the so-called “chrominance”, that is the standard deviation of the RGB components around an intersection: as the stones are white or black while the goban surface is usually yellowish, computing an intersection’s chrominance would tell whether it is colourful (then empty) or some shade of grey (then likely white or black, thus covered by a stone). With a truly yellowish goban, chrominance analysis is very reliable, as shadows and reflections would never turn a colourful intersection into a greyish one; but this could happen anyhow under certain lights, making chrominance completely worthless, or even detrimental. Also, any major disturbance would have the same effect.
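As a rough illustration of these two features, the sketch below averages the RGB values of a few sample pixels and then computes luminance and chrominance from the averages. The luminance weights are the standard ITU-R BT.601 ones; using the population standard deviation for the chrominance is an assumption, since the text does not say which variant is meant, and the pixel values are invented for the example.

```python
from statistics import pstdev


def mean_rgb(pixels):
    """Average a list of (R, G, B) pixels channel by channel."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def luminance(rgb):
    """Weighted sum of the channels (ITU-R BT.601 weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def chrominance(rgb):
    """Spread of the three channels: near 0 for grey, large for a coloured surface."""
    return pstdev(rgb)


# Invented sample pixels: yellowish wood, a white stone, a black stone.
wood = mean_rgb([(200, 160, 80), (210, 170, 90)])
white = mean_rgb([(230, 230, 228), (232, 230, 230)])
black = mean_rgb([(30, 30, 32), (28, 30, 30)])

for name, rgb in (("wood", wood), ("white", white), ("black", black)):
    print(f"{name:5s} L={luminance(rgb):6.1f} C={chrominance(rgb):5.1f}")
```

With these values the luminance separates white stones from black ones, while the chrominance separates both kinds of stone from the wood.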

Other potential features are hue and saturation (the first two components of the HSL colour system), both fully capable of highlighting the stones; furthermore, the hue is insensitive to shadows, even dark ones, while saturation is not affected by reflections. Unfortunately the opposite is also true: hue highlights any reflections, even faint ones, while saturation is sensitive to shadows.
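The behaviour under shadows can be checked with Python's standard `colorsys` module. The sketch models a shadow as a plain scaling of the RGB values, which is a simplifying assumption, and the wood colour is invented; reflections are not modelled here.

```python
import colorsys


def hue_saturation(rgb):
    """Return (hue in degrees, HSL saturation in [0, 1]) for an 8-bit RGB triple."""
    r, g, b = (c / 255.0 for c in rgb)
    h, _lightness, s = colorsys.rgb_to_hls(r, g, b)  # note the H, L, S order
    return h * 360.0, s


wood_lit = (205, 165, 85)
wood_shadow = tuple(c * 0.5 for c in wood_lit)  # assumed multiplicative shadow
grey_stone = (128, 128, 128)

h_lit, s_lit = hue_saturation(wood_lit)
h_shadow, s_shadow = hue_saturation(wood_shadow)
h_grey, s_grey = hue_saturation(grey_stone)
print(h_lit, s_lit)
print(h_shadow, s_shadow)
print(h_grey, s_grey)
```

Under this model the hue of the wood is unchanged by the shadow while its saturation drops, and a neutral grey has zero saturation.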

The last feature is the so-called “uniformity”. While the stones’ surface looks uniform, the goban’s does not, because the intersections are crossed by the grid lines, which produce a local disturbance that disrupts the surface uniformity. A careful search for such small disturbances could possibly tell the empty intersections apart from the stones and even circumvent major disturbance. Problems arise on the borders, especially the corners, where some lines are missing, and with some kinds of disturbance (dark, uniform sleeves, for example).
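Section 3.3.3 measures this feature as a “disuniformity”: the sum, along a spiral around the intersection, of the distances between consecutive pixels in the (luminance, chrominance) plane. The sketch below is one possible reading of that description under stated assumptions: an Archimedean spiral of one turn, no upward shift for projection effects, population standard deviation for the chrominance, and invented toy images. All function names are hypothetical.

```python
import math
from statistics import pstdev


def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def chrominance(rgb):
    return pstdev(rgb)


def spiral_points(cx, cy, max_radius, steps=32):
    """Pixel coordinates along one turn of a spiral growing from the centre."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        angle, radius = 2 * math.pi * t, max_radius * t
        p = (round(cx + radius * math.cos(angle)),
             round(cy - radius * math.sin(angle)))
        if not points or points[-1] != p:  # skip consecutive duplicates
            points.append(p)
    return points


def disuniformity(image, cx, cy, max_radius, steps=32):
    """Sum of sqrt(dL^2 + dC^2) between consecutive pixels along the spiral."""
    pts = spiral_points(cx, cy, max_radius, steps)
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        a, b = image[y0][x0], image[y1][x1]
        total += math.hypot(luminance(b) - luminance(a),
                            chrominance(b) - chrominance(a))
    return total


# Toy 11x11 images: an empty intersection (wood crossed by two dark grid lines)
# and the same area fully covered by a black stone.
WOOD, LINE, STONE = (205, 165, 85), (40, 40, 40), (30, 30, 30)
empty = [[LINE if x == 5 or y == 5 else WOOD for x in range(11)] for y in range(11)]
covered = [[STONE] * 11 for _ in range(11)]

d_empty = disuniformity(empty, 5, 5, 4)
d_covered = disuniformity(covered, 5, 5, 4)
print(d_empty, d_covered)
```

The spiral crosses the grid lines on the empty intersection, so its disuniformity is high; on the uniform stone it is zero.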

The pros: all the features are easy to compute and, under good conditions, produce good outcomes. Some are also fast to compute.

The cons: all the features are sensitive to disturbance and cannot be trusted alone; they need to be combined in a complex formula. Some are not fast to compute (uniformity, chrominance).

3 Circular/elliptical Hough transform 

The Hough transform, as previously discussed, is a mathematical process capable of highlighting specific features, such as lines, circles and so on, if present in the pictures. It could easily be employed for locating the stones, which look like small ellipses scattered on a flat surface. The process creates “signals” in the Hough space, each one matching one of the features we’re looking for: a graphical depiction of these signals is shown in figure 11, with the picture to inspect on the left and the corresponding Hough space on the right, after the application of the circular transform: the stronger a signal (the white circles), the more likely a stone will be found over the matching intersection. At first, it looks like this process could solve the stones’ detection problem once and for all, as the signals are strong where a stone is indeed placed over an intersection, weak otherwise, and even major disturbance, unless resembling small circles, won’t affect the outcome. But on closer scrutiny figure 11 shows that some of the weak signals are not so weak after all: for example, there are many on the goban’s upper side. This is not a minor detail: as the stones look more elliptical than circular, the further they are from the observer or the lower the point of view, the more difficult it is for the transform to detect them. In order to bypass the problem, an elliptical transform could be employed, but the time required would grow, becoming unsuitable for real-time analysis. Furthermore, the outcome of the transform is influenced by nearby stones, especially off-centre ones; some disturbance (for example fingertips) and even shadows cast by the stones themselves could also alter the signals.

Figure 11: the outcome of the circular transform with a good point of view: strong signals matching the stones, and the stones only.

The pros: with a good point of view the outcome is always good, almost insensitive to disturbance.

The cons: it’s a slow process; if the point of view is bad the outcome becomes erratic; it is sensitive to some kinds of disturbance that do not affect other criteria.
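To make the idea of “signals” in the Hough space concrete, here is a minimal pure-Python circular transform for a single, known radius. It is a sketch under strong assumptions (edge pixels already extracted, radius fixed in advance, toy image), not the implementation used by the authors; in practice a library routine such as OpenCV's `HoughCircles` would normally be preferred.

```python
import math


def circle_edge(cx, cy, radius, samples=60):
    """Edge pixels of a synthetic circle, standing in for an edge detector's output."""
    pts = set()
    for k in range(samples):
        a = 2 * math.pi * k / samples
        pts.add((round(cx + radius * math.cos(a)), round(cy + radius * math.sin(a))))
    return sorted(pts)


def hough_circle_votes(edge_points, radius, width, height, angle_steps=360):
    """Accumulator in which every edge pixel votes for all centres at distance `radius`."""
    acc = [[0] * width for _ in range(height)]
    for x, y in edge_points:
        voted = set()  # each edge pixel votes at most once per candidate centre
        for k in range(angle_steps):
            a = 2 * math.pi * k / angle_steps
            cx = round(x - radius * math.cos(a))
            cy = round(y - radius * math.sin(a))
            if 0 <= cx < width and 0 <= cy < height and (cx, cy) not in voted:
                voted.add((cx, cy))
                acc[cy][cx] += 1
    return acc


def strongest_centre(acc):
    """Location and strength of the highest peak in the accumulator."""
    votes, x, y = max((v, x, y) for y, row in enumerate(acc) for x, v in enumerate(row))
    return (x, y), votes


edge = circle_edge(10, 10, 5)            # a "stone" of radius 5 centred at (10, 10)
acc = hough_circle_votes(edge, 5, 21, 21)
centre, votes = strongest_centre(acc)
print(centre, votes, len(edge))
```

The nested loops also show why the process is slow: every edge pixel casts a full circle of votes, and an elliptical transform would add further parameters to scan.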

3.3.2 The algorithm 

After a careful evaluation of the criteria’s pros and cons, we gave up the slow Hough transform, relying heavily upon the “difference between pictures” instead. We also made use of all the local features discussed before and step by step the success rate improved, eventually reaching a limit when the “short blanket” effect appeared: any variation of the thresholds that could, for example, decrease the number of false negatives made new false positives arise, and vice versa. A 100% success rate was indeed achieved, but only under good conditions and with no disturbance: under fair conditions it did not exceed 98-99% and, if disturbance was present, never went beyond 25-30%. This was by all means a remarkable feat, but there was no real improvement over the past attempts, the ones we listed in the introduction: the advantage of the “minimal” approach was clear, but still not decisive.

A breakthrough was eventually made when a friend of ours, the renowned amateur mathematician Dani Ferrari, M.Sc., especially known for his work on alphametics, recommended a new technique in order to make the most of the “minimal” approach. He observed that most of the time it was not possible to tell disturbance apart from stones and suggested examining all the intersections, not just the empty ones, in order to gather information about “what the stones look like”. This meant sorting the intersections into three groups, empty, covered by a white stone, covered by a black stone (using data from previous pictures), and computing the local features’ mean values for each one of them; after that, a comparison between these mean values and those of an empty intersection should disclose the nature of the latter. A small difference with the “mean white stone”, for example, together with a big difference with the “mean empty intersection” would mean that a white stone is over the intersection under scrutiny. The idea proved correct: after checking promising but unsure intersections by means of the Hough transform, a success rate of 100% was eventually achieved even when disturbance was present, provided the last stone put on the goban was fully visible. Even under fair conditions the success rate was close to 100%, with or without disturbance.
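The comparison with the three “mean” intersections can be sketched as below. The feature values are invented, the features are used unscaled, and the hue is treated as an ordinary number rather than an angle; these are all simplifying assumptions. The differences are merged with a root of the sum of squares over luminance L, chrominance C, disuniformity D and hue H, following the details given in section 3.3.3.

```python
import math

FEATURES = ("L", "C", "D", "H")  # luminance, chrominance, disuniformity, hue


def mean_features(group):
    """Average each feature over a group of intersections."""
    n = len(group)
    return {k: sum(f[k] for f in group) / n for k in FEATURES}


def difference(features, reference):
    """Single merged difference between an intersection and a group mean."""
    return math.sqrt(sum((features[k] - reference[k]) ** 2 for k in FEATURES))


# Invented feature values for intersections already classified in previous pictures.
known = {
    "empty": [{"L": 168, "C": 50, "D": 300, "H": 40},
              {"L": 160, "C": 48, "D": 280, "H": 42}],
    "white": [{"L": 230, "C": 2, "D": 20, "H": 50},
              {"L": 226, "C": 4, "D": 30, "H": 60}],
    "black": [{"L": 30, "C": 1, "D": 10, "H": 200},
              {"L": 34, "C": 3, "D": 14, "H": 220}],
}
means = {name: mean_features(group) for name, group in known.items()}

# An intersection that was empty in the previous picture and is now under scrutiny.
candidate = {"L": 36, "C": 2, "D": 18, "H": 190}
diffs = {name: difference(candidate, mean) for name, mean in means.items()}
closest = min(diffs, key=diffs.get)
print(closest, diffs)
```

Here the candidate is close to the “mean black stone” and far from the “mean empty intersection”, which is the pattern the text describes for a newly played stone.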

This algorithm proved so good mainly for two reasons:

1 Although a disturbance does not look like a stone, it’s difficult to tell the one from the other if we only stick to empty intersections. Something else is needed in order to decide where the differences lie and help from previously located stones is crucial. For example, let’s see what happens employing this algorithm on the picture in figure 12: the distributions on the right show the differences between the local features’ mean values for each empty intersection and the corresponding mean values of the “average” white stone, the “average” black stone, the “average” empty intersection respectively. Some intersections don’t look “like empty” (due to the shadow’s disturbance, highlighted by a circle), but only one of them also looks “like black”, while none at all looks “like white”. That’s more than enough to detect the last stone put on the goban, of course a black one, pointed out (together with the corresponding peaks) by an arrow.

Figure 12: distribution of differences between empty intersections and average white stone (top), average black stone (middle), average empty intersection (bottom).

2 We stated before that the thresholds needed by the “minimal” approach make things easier, as isolated peaks can be immediately detected and we need not determine “how big” these peaks should be. But these thresholds are still sensitive to variations in the light or the point of view, disturbance and so on, hence the limit in the success rate. In the new technique we still need to define “how similar” (to those of a stone) or “how far apart” an intersection’s local features must be in order to identify a stone, but these parameters are not affected by light, disturbance and so on, thus making it possible to set them once and for all, hence the 100% success rate.

3.3.3 The details 

1. the intersections are sorted into three groups: empty, white stones, black stones (stone positions, except for the last one put on the goban, are known from previous pictures).

2. for each intersection, the local features’ values are computed:

• around each intersection a circle is scanned, its radius being roughly 1/5 of the size of a stone, its centre shifted to compensate for projection effects; the area around the real centre, where the grid lines intersect, is left out (see figure 13) as it contains a lot of dark pixels that reduce the intersection’s luminance and chrominance, making it dangerously similar to a black stone. The RGB, the hue and the saturation values of all the remaining pixels are then averaged.

• the intersection’s luminance L is then computed by means of the known formula

$L = 0.299R + 0.587G + 0.114B$,

while the chrominance C is the standard deviation of the RGB values. Hue H and saturation are usually part of the pixel attributes and do not need to be computed.

• starting in the intersection’s centre and ending after a complete turn, an upward spiral is swept. Its diameter is about 1/8 of that of a stone (see figure 14). For each pixel along the spiral the “distance” from the preceding one is computed by means of the formula $\sqrt{(\Delta L)^2 + (\Delta C)^2}$.

If the sum of the “distances” (which we call “disuniformity” D) is high, it’s likely because around the intersection the grid is fully visible (hence the intersection is probably empty); if it’s low, the grid is likely not visible (hence, a stone is probably there).^



Figure 13: the areas used for computing the intersections’ local features.

Figure 14: the spirals used for computing the intersections’ disuniformity.


3. for each group, the values of the intersections’ local features are averaged. We define these values as the ones pertaining to the “mean empty intersection”, the “mean white stone”, the “mean black stone”.

4. for each empty intersection, the differences between the values of its local features and the values of the “mean empty intersection”, “mean white stone” and “mean black stone” are computed, then merged together by means of the following formula:

$\sqrt{(\Delta L)^2 + (\Delta C)^2 + (\Delta D)^2 + (\Delta H)^2}$,

thus reducing them to only one (at present, saturation is computed but not employed).

5. eventually three discrete functions are built, with domain the set of empty intersections and codomain the differences with the “mean white stone”, the “mean black stone”, the “mean empty intersection” respectively, computed by means of the formula above (see figure 12).

6. the lowest value in the first/second function, “difference with mean white/black stone”, is checked against three conditions:

(a) it does not exceed 2/3 of the corresponding value of the difference with the “mean empty intersection” (if true, that means the intersection corresponding to the lowest value looks much more like a stone than an empty one).

(b) it does not exceed 2/3 of the mean value of the following 15 intersections, assuming the function has been sorted from lowest to highest (if true, that means this intersection and the “mean white/black stone” are much alike).

(c) its discrete derivative is at least 6 times higher than the mean derivative of the following 15 intersections (if true, that means it’s a peak, hence likely a stone).

7. if all three conditions are met, a stone is likely to be found over the intersection corresponding to the function’s lowest value. This is double-checked by means of the Hough transform: if a high value is returned, the stone is assumed to be really there (if a low value is returned instead, it is assumed to be a false positive).

8. if only the first condition is met, the intersection is again double-checked by means of the Hough transform. If a very high value is returned, a stone is assumed to be there (thus avoiding a false negative).

9. usually either a white stone or a black stone is found, but both functions are checked nonetheless; if the whole process is repeated on the second to lowest value of both functions other stones could be detected (useful when some pictures are missing or have been discarded).

^ A spiral is needed because the shift cannot be applied to a circumference: were it employed, problems would appear on the upper border’s intersections where the grid would be missed, thus spawning a low disuniformity whether or not stones were present; were it not, projection effects could not be neglected.
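Steps 6 to 8 can be sketched as follows. Several points are assumptions of this sketch rather than statements of the text: the “discrete derivative” is read as the step between consecutive values of the sorted function, the Hough signal is taken as a number in [0, 1], and the two Hough thresholds `high` and `very_high` are hypothetical values chosen only for the example.

```python
def check_candidate(stone_diff, empty_diff, window=15):
    """Apply conditions (a), (b), (c) to the lowest value of a 'difference with
    mean white/black stone' function. Both arguments map each empty
    intersection to its merged difference."""
    ordered = sorted(stone_diff, key=stone_diff.get)
    values = [stone_diff[p] for p in ordered]
    if len(values) < window + 2:
        raise ValueError("not enough empty intersections")
    best, lowest = ordered[0], values[0]
    following = values[1:window + 1]
    # steps[0] is the jump right after the lowest value, steps[1:] the next 15.
    steps = [values[k + 1] - values[k] for k in range(window + 1)]
    cond_a = lowest <= (2 / 3) * empty_diff[best]
    cond_b = lowest <= (2 / 3) * (sum(following) / window)
    cond_c = steps[0] >= 6 * (sum(steps[1:]) / window)
    return best, (cond_a, cond_b, cond_c)


def decide(conditions, hough_signal, high=0.6, very_high=0.85):
    """Steps 7 and 8: confirm the candidate with the Hough transform."""
    a, b, c = conditions
    if a and b and c:
        return hough_signal >= high        # step 7
    if a and not b and not c:
        return hough_signal >= very_high   # step 8
    return False


# Toy data: 19 ordinary empty intersections plus one that looks like a stone.
stone_diff = {(i, 0): 100.0 + i for i in range(19)}
empty_diff = {p: 15.0 for p in stone_diff}
stone_diff[(3, 3)] = 10.0
empty_diff[(3, 3)] = 200.0

best, conditions = check_candidate(stone_diff, empty_diff)
print(best, conditions, decide(conditions, hough_signal=0.7))
```

The function would be called once on the “white” differences and once on the “black” ones, as step 9 requires; cases where only two of the three conditions hold are not covered by the text and are rejected here.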


4 Conclusion 

We performed some tests to evaluate the success rate achieved by the algorithm under different conditions. After a friendly game in summer 2012, we recorded and analysed some others: first, in December 2012, three games played in the Florence “Il David” Tournament, then Imago’s first test game^ and eventually, in March 2015, two games played in the Pisa International Go Tournament, one of which was also filmed in view of the video analysis we hint at later on. The first table presents the test results; with only seven test games, however, the exact success rate of the algorithm cannot be determined, and that’s why, after a careful scrutiny of the results and of the most critical pictures, we estimated an approximate rate, presented in a more concise table. Our conclusions were:

^ The second one was deemed unreliable because too many stones were off-centre (more than 10) and the automatic grid recalibration had to be turned off, as one corner of the grid was too close to the pictures’ borders.








Game                Disturbance                                       Stone   Missing   Duplicate  Moves
                    none          stone fully   stone partly  stone   off     pictures  pictures   played
                                  visible       visible       hidden  centre
Corsolini-Carta     291           6             1             0       0       1         1          299
(friendly game)     291 (100.0%)  6 (100.0%)    1 (100.0%)
Grazzini-Bevegni    189           46            5             12      1       9         5          262
(Florence)          189 (100.0%)  46 (100.0%)   3 (60.0%)
Pace-Zingoni        239           34            4             0       0       6         0          283
(Florence)          239 (100.0%)  34 (100.0%)   0 (0.0%)
De Lucia-Pace       204           27            3             6       0       6         0          246
(Florence)          203 (99.5%)   27 (100.0%)   1 (33.3%)
Zingoni-Shakhov     176           19            4             2       0       4         3          205
(Pisa)              175 (99.4%)   18 (94.7%)    1 (25.0%)
Pignelli-Albano     160           54            4             0       0       15        0          233
(Pisa)              156 (97.5%)   51 (94.4%)    3 (75.0%)
Imago test 1        224           15            0             0       3       0         2          242
                    224 (100.0%)  10 (66.7%)

Table 1: results. For each game: on the first row the number of pictures examined, sorted by problem typology; on the second row the pictures in which the last stone played was correctly detected. Each dataset also includes a photo of the empty goban.
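As a quick arithmetic check, the counts of Table 1 can be pooled over the seven games. The pooled figures below are simply derived from the table, mixing games played under different global conditions; they are not the authors' per-condition estimate. The dictionary layout and the column keys are illustrative.

```python
# (pictures examined, last stone correctly detected) per game, copied from Table 1.
RESULTS = {
    "Corsolini-Carta": {"none": (291, 291), "fully": (6, 6), "partly": (1, 1)},
    "Grazzini-Bevegni": {"none": (189, 189), "fully": (46, 46), "partly": (5, 3)},
    "Pace-Zingoni": {"none": (239, 239), "fully": (34, 34), "partly": (4, 0)},
    "De Lucia-Pace": {"none": (204, 203), "fully": (27, 27), "partly": (3, 1)},
    "Zingoni-Shakhov": {"none": (176, 175), "fully": (19, 18), "partly": (4, 1)},
    "Pignelli-Albano": {"none": (160, 156), "fully": (54, 51), "partly": (4, 3)},
    "Imago test 1": {"none": (224, 224), "fully": (15, 10), "partly": (0, 0)},
}


def pooled(column):
    """Total examined, total detected and pooled percentage for one column."""
    examined = sum(game[column][0] for game in RESULTS.values())
    detected = sum(game[column][1] for game in RESULTS.values())
    return examined, detected, 100.0 * detected / examined


for column in ("none", "fully", "partly"):
    examined, detected, rate = pooled(column)
    print(f"{column:>6}: {detected}/{examined} = {rate:.1f}%")
```

The very small number of “partly visible” cases makes that pooled percentage a weak estimate.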


• The success rate strictly depends on the goban size (in pixels) and the elevation angle of the point of view. We classified these conditions as shown in the following image:

Figure 15: 1 Corsolini-Carta, 2 Grazzini-Bevegni, 3 Pace-Zingoni, 4 De Lucia-Pace, 5 Imago tests, 6 Zingoni-Shakhov, 7 Pignelli-Albano. Complete datasets are available from http://www.oipaz.net/PhotoKifu.html

• Preconditions: the stones must be where they’re supposed to be (stones entirely off-centre cannot be detected) and must be visible.

• It’s difficult to evaluate the success rate when the stones are only partly hidden. Our estimate is based on the test results as well as on a careful examination of the few relevant cases.

• No games were played under optimal conditions, but the success rate can be easily inferred. Poor conditions cannot be evaluated because “poor” can mean anything, from “almost fair”, as in the Imago test games that indeed presented a good outcome, to completely useless pictures.


Global        Disturbance
conditions    none        stone fully   stone partly
                          visible       visible
optimal       100%        100%          70%-80%
good          100%        100%          60%-70%
fair          99%-100%    95%-100%      50%-60%
poor          not evaluated

Table 2: success rate per move of the algorithms.


As the above table shows, the algorithm works very well and it’s extremely fast, as each picture required about 0.2 seconds to be processed, plus some necessary pre-processing^ (resizing and converting to B/W in order to make use of the Hough transform).

The test games were analysed by means of a freely distributed program, PhotoKifu,^ built around the algorithms previously discussed. Yet some aspects of the whole process will be further improved in order to achieve the ultimate goal, that is a real-time automatic analysis of a Go game by means of a video stream. For example, the goban tracking algorithm will have to detect larger movements, as we observed that players’ fidgeting, combined with tables’ swinging, often proves too much for the current procedure; we also noticed that stones are played off-centre more often than expected, thus exposing the detection technique’s only weakness, which needs to be dealt with. The new program will be called VideoKifu and will likely establish a new standard for recording Go games.


^ The required time depends on the image processing library employed: with OpenCV, for instance, it is almost negligible.

^ http://www.oipaz.net/PhotoKifu.html
Appendix [Colour images for online publication]



Figure 16: enlarged colour version of figure 2. 



Figure 17: an example of what is discussed in paragraph 3.3.1.2. Same goban, different light: on the left, natural light (the 
surface looks colourful), on the right, artificial (not only the surface looks grey, but the white stones look more colourful). 



Figure 18: on the left a barely detectable off-centre stone, on the right a stone so off-centre it cannot be detected (see section 4). 




















































































































































